Sobolev Training for Neural Networks
At the heart of deep learning we aim to use neural networks as function approximators, training them to produce outputs from inputs in emulation of a ground truth function or data creation process. In many cases we only have access to input-output pairs from the ground truth; however, it is becoming more common to have access to derivatives of the target output with respect to the input, for example when the ground truth function is itself a neural network, as in network compression or distillation. Generally these target derivatives are not computed, or are ignored. This paper introduces Sobolev Training for neural networks, a method for incorporating these target derivatives in addition to the target values while training.
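As an illustration of the idea described above, here is a minimal PyTorch sketch of a loss that matches both target values and target derivatives. It is not the paper's implementation. The toy ground truth (sin with derivative cos), the network size, the weighting factor `weight`, and the helper name `sobolev_loss` are all assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)

# Assumed toy ground truth with a known derivative: f(x) = sin(x), f'(x) = cos(x).
x = torch.linspace(-3.0, 3.0, 64).unsqueeze(1)
y_target = torch.sin(x)
dy_target = torch.cos(x)

# Small smooth network (hypothetical architecture).
model = torch.nn.Sequential(
    torch.nn.Linear(1, 32),
    torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)


def sobolev_loss(model, x, y_target, dy_target, weight=1.0):
    """Mean squared error on values plus mean squared error on input derivatives."""
    x = x.detach().clone().requires_grad_(True)
    y = model(x)
    # Each output depends only on its own scalar input, so the gradient of the
    # summed output gives the per-sample derivative dy/dx. create_graph=True
    # keeps this derivative differentiable with respect to the parameters.
    (dy,) = torch.autograd.grad(y.sum(), x, create_graph=True)
    value_term = torch.mean((y - y_target) ** 2)
    derivative_term = torch.mean((dy - dy_target) ** 2)
    return value_term + weight * derivative_term


initial_loss = sobolev_loss(model, x, y_target, dy_target).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = sobolev_loss(model, x, y_target, dy_target)
    loss.backward()
    optimizer.step()
final_loss = sobolev_loss(model, x, y_target, dy_target).item()
```

Setting `weight=0.0` recovers ordinary training on values only, which makes it easy to compare the two objectives on the same data.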
Supplementary information
1 Simulation parameters
All simulations were implemented in PyTorch [5]. For the nonlinear neuroscience tasks, we applied the gradient-based optimizer Adam [4] to the recurrent weights W as well as to the input and output vectors m_i, w_i. We checked that our results did not depend qualitatively on the choice of Adam over plain gradient descent; however, training converged more easily with Adam. We also checked that restricting training to W only (as for the simple model) did not alter our results qualitatively (although, with this restriction, training on the Romo task for small values of g did not converge). Code for reproducing our results can be found at https://github.com/frschu/neurips_
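The two training configurations described above (Adam applied to W together with the input and output vectors, or to W alone) could be set up along the following lines. This is a hedged sketch, not the authors' code. The rate-model dynamics, the toy sign task, the values of `N`, `g`, `T`, `dt`, and the helper names `make_params`, `readout`, and `train` are all assumptions made for illustration.

```python
import torch

N, g = 64, 0.5      # assumed network size and initial weight scale
T, dt = 20, 0.5     # assumed number of Euler steps and step size


def make_params(seed=0):
    """Draw recurrent weights W, an input vector m, and an output vector w."""
    gen = torch.Generator().manual_seed(seed)
    W = g * torch.randn(N, N, generator=gen) / N ** 0.5
    m = torch.randn(N, generator=gen)
    w = torch.randn(N, generator=gen)
    return W, m, w


def readout(W, m, w, u):
    """Euler-integrate an assumed rate model under constant input u."""
    x = torch.zeros(N)
    for _ in range(T):
        x = x + dt * (-x + W @ torch.tanh(x) + m * u)
    return w @ torch.tanh(x) / N


def train(train_vectors, steps=100, lr=1e-3):
    """Train with Adam on W, m, w (train_vectors=True) or on W only."""
    W, m, w = make_params()
    trained = [W, m, w] if train_vectors else [W]
    for p in trained:
        p.requires_grad_(True)
    optimizer = torch.optim.Adam(trained, lr=lr)
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        # Hypothetical toy task: the readout should follow the input sign.
        loss = sum((readout(W, m, w, u) - 0.5 * u) ** 2 for u in (1.0, -1.0))
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


losses_all = train(train_vectors=True)
losses_W_only = train(train_vectors=False)
```

Both runs start from the same seeded initialization, so any difference between `losses_all` and `losses_W_only` comes from which parameters the optimizer is allowed to update. Swapping `torch.optim.Adam` for `torch.optim.SGD` gives the plain gradient descent comparison.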